KMID : 1132720200180030033
Genomics & Informatics, 2020, Volume 18, No. 3, p. 33
Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0
Kim Sun-Ho, Kim Ro-Young, Nam Hee-Jo, Kim Ryeo-Gyeong, Ko En-Jin, Kim Han-Su, Shin Ji-Hye, Cho Da-Eun, Jin Yu-Rhee, Bae So-Yeon, Jo Ye-Won, Jeong San-Ah, Kim Ye-Na, Ahn Seo-Yeon, Jang Bo-Mi, Seong Ji-Heyon, Lee Yu-Jin, Seo Si-Eun, Kim Yu-Jin, Kim Ha-Jeong, Kim Hye-Ji, Sung Hye-Lynn, Lho Hyo-Young, Koo Jay-Won, Chu Ji-On, Lim Ju-Won, Kim Young-Ju, Lee Kyung-Yeon, Lim Yu-Ri, Kim Meong-Eun, Hwang Seon-Jeong, Han Shin-Hye, Bae So-Hyeun, Kim Su-A, Yoo Su-Hyeon, Seo Yeon-Jeong, Shin Ye-Rim, Kim Yon-Soo, Ko You-Jung, Baek Ji-Hee, Hyun Hye-Jin, Choi Hye-Min, Oh Ji-Hye, Kim Da-Young, Park Hyun-Seok
Abstract
This paper describes a community effort to improve earlier versions of the full-text corpus of Genomics & Informatics by semi-automatically detecting and correcting PDF-to-text conversion errors and optical character recognition errors during the first hackathon of the Genomics & Informatics Annotation Hackathon (GIAH) event. Extracting text from multi-column biomedical documents such as Genomics & Informatics is notoriously difficult. The hackathon was piloted as part of a coding competition of the ELTEC College of Engineering at Ewha Womans University to enable researchers and students to create or annotate their own versions of the Genomics & Informatics corpus, to gain and create knowledge about corpus linguistics, and at the same time to acquire tangible, transferable skills. The projects proposed during the hackathon draw on an internal database containing different versions of the corpus and its annotations.
Keywords
biomedical text mining, corpus, text analytics